The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching

Authors

  • Rafael H. Saavedra
  • Weihua Mao
  • Daeyeon Park
  • Jacqueline Chame
  • Sungdo Moon
Abstract

Unimodular transformations, tiling, and software prefetching are loop optimizations known to be effective in increasing parallelism, reducing cache miss rates, and eliminating processor stall time. Although each of these optimizations is quite effective on its own, there is the expectation that even better improvements can be obtained by combining them. In this paper we show that this is indeed the case when unimodular transformations are combined with either tiling or software prefetching. However, our results also show that although combining tiling with prefetching tends to improve on the performance of tiling alone, in some situations tiling can degrade the cache performance of software prefetching. The reasons for this unexpected behavior are threefold: 1) tiling introduces interference misses inside the localized space, which are difficult to characterize with current techniques based on locality analysis; 2) prefetch predicates are computed using only estimates of the number of capacity misses, so the latency induced by cache interference is not completely covered; and 3) tiling limits the maximum amount of latency that can be masked with prefetching.
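As a rough illustration of two of the optimizations named in the abstract, the sketch below reorders the iteration points of a doubly nested loop in two ways: by a unimodular transformation (here loop interchange, written as an integer matrix with determinant ±1) and by tiling. The function names, the 3×4 iteration space, and the tile size are assumptions chosen for illustration and do not come from the paper. Software prefetching and cache behavior are not modeled, since they depend on the hardware.

```python
# Illustrative sketch only: names, sizes, and tile size are assumptions,
# not taken from the paper. It shows that both transformations reorder the
# same set of iterations; it does not model caches or prefetching.

INTERCHANGE = [[0, 1], [1, 0]]  # unimodular matrix for loop interchange


def determinant_2x2(t):
    """Determinant of a 2x2 integer matrix (±1 for a unimodular matrix)."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]


def apply(t, point):
    """Multiply the 2x2 matrix t by the iteration vector (i, j)."""
    i, j = point
    return (t[0][0] * i + t[0][1] * j, t[1][0] * i + t[1][1] * j)


def original_order(n, m):
    """Iteration points of `for i in range(n): for j in range(m)`."""
    return [(i, j) for i in range(n) for j in range(m)]


def transformed_order(points, t):
    """Original points, listed in the order the transformed loop visits them.

    The transformed loop nest runs in lexicographic order of t * (i, j).
    """
    return sorted(points, key=lambda p: apply(t, p))


def tiled_order(n, m, tile):
    """Visit the same n x m points one tile x tile block at a time."""
    order = []
    for ii in range(0, n, tile):          # loops over tiles
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):      # loops inside a tile
                for j in range(jj, min(jj + tile, m)):
                    order.append((i, j))
    return order


if __name__ == "__main__":
    pts = original_order(3, 4)
    print(transformed_order(pts, INTERCHANGE))
    print(tiled_order(3, 4, 2))
```

Both functions return a permutation of the original iteration points. Whether a given reordering is legal for a particular loop depends on its data dependences, which this sketch does not check.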


Related resources

A Linear Algebraic View of Loop Transformations and Their Interaction

Although optimizing transformations have been studied for over two decades, the interactions between them are not well understood. This is particularly important for the success of parallelizing compilers. In order to deal with interactions, we view loop transformations as multiplication by a suitable matrix. The transformations considered are loop interchange, permutation, reversal, hyperplane ...


The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...


Review of A Data Locality Optimizing Algorithm

What problem did the paper address? Who is the intended audience? The big-picture problem is how we can improve program performance given the large latency between the processor and memory. The audience is compiler researchers and writers, because the paper focuses on an existing compilation technique called tiling, which was developed to avoid memory access latency. The paper addresses the pro...


of A Data Locality Optimizing Algorithm

What problem did the paper address? The big picture problem is how can we improve program performance given the large latency between the processor and memory. An approach that has been used in the past is a transformation called tiling. The paper addresses the problem that not all loops are initially tileable. Specifically they answer the question, what combination of loop permutation, skewing...

Publication date: 1996